The firmness of the mango fruit is one of the internal physical properties that 
can indicate its quality. Unfortunately, non-destructive methods to measure it 
are not yet available. In the current study, we develop a calibration model 
using near infrared spectroscopy to predict the firmness 
of the mango cultivar Arumanis (Mangifera indica cv. Arumanis) via 
machine learning. Spectral data were acquired using a Fourier transform 
near-infrared (FTNIR) benchtop instrument with a wavelength range of 1000 to 
2500 nm. Multivariate spectral analysis based on machine learning, including 
principal component regression (PCR), partial least squares regression 
(PLSR), and support vector machine regression (SVMR), was applied and 
compared to estimate the firmness of fresh mangoes. The results 
show that PLSR predicts mango firmness better than 
SVMR and PCR. For the best PLSR model, the correlation 
coefficients of calibration (rc) and validation (rcv), the root mean square 
errors of calibration (RMSE-C) and validation (RMSE-CV), and the ratio of 
prediction to deviation (RPD) were 0.941, 0.920, 0.382 kgf, 0.472 kgf, and 
2.556, respectively. Overall, the results indicate that near 
infrared spectroscopy integrated with an appropriate machine 
learning algorithm is a promising way to determine the firmness of 
mango non-destructively. 
1. INTRODUCTION 


The freshness of mangoes after harvest is highly correlated with internal physical properties, 
especially firmness, which is also a major quality attribute of mango. The firmness of the mango fruit 
decreases over the postharvest storage period, depending on the storage conditions. In general, changes in 
fruit firmness are correlated with changes in the structure of the cell wall of the fruit [1]-[4]. 
Therefore, internal physical properties such as firmness are among the best parameters that indicate 
ripeness, and knowing them increases the value of the mango product when it is sold to consumers. The 
firmness of the product should therefore be known early, from the post-harvest and shipping stages through to retail. 
Unfortunately, firmness still has to be measured destructively. As a result, measurements can 
only be carried out on samples drawn from many batches, and the results do not always represent 
each individual mango fruit. Nondestructive determination of parameters such as the 
firmness of mango fruit is therefore an important goal. 

Non-destructive evaluation of internal physical properties is important for sorting and grading the 
quality of agricultural products. One of the technologies that has been studied for the evaluation of 
agricultural products is near infrared spectroscopy. Near infrared spectroscopy 
integrated with chemometrics is convenient for evaluating the internal properties of agricultural 
products because it is quick, accurate, and non-destructive. Several studies have 
demonstrated this for a number of mango cultivars. Mangoes of the Kent cultivar have been 
evaluated for their internal properties using near infrared spectroscopy to predict vitamin C and total acidity 
[5]-[7]. Rivera, et al. [8] conducted a study using the Manila mango cultivar to evaluate the mechanical 
damage that occurred to the mango using near infrared spectroscopy. Purwanto, et al. [9] evaluated the 
soluble solids and acidity of the Gedong Gincu cultivar mango using near-infrared spectroscopy. dos Santos 
Neto, et al. [10] reported using near-infrared spectroscopy to determine the maturity of the Palmer mango 
cultivar. Unfortunately, studies of internal physical properties, especially firmness, of the 
Arumanis mango cultivar using near-infrared spectroscopy have not been reported to date. 

Chemometrics is used to find the relationship between the predictor (near-infrared 
spectral data at each wavelength) and the response, such as mechanical or chemical properties of the sample. This method 
draws on mathematics and statistics, which are combined into an algorithm to find the 
relationship. The most popular algorithms for processing near-infrared spectral data are PCR and PLSR [11]-[13]. 
Both algorithms are reported to perform very well in building spectral calibration models 
if the response is linear in the predictor. However, 
when the relationship between the spectral data and the response is nonlinear, machine 
learning-based algorithms are needed to handle it. Popular machine learning algorithms for processing near-infrared 
spectral data include SVMR, ANN, and CNN [14]-[16]. 

This study develops a near infrared spectroscopy calibration model to accurately estimate the 
firmness of mango, with the aim of working as reliably as the standard laboratory method. 
In the present work, two linear approaches and one nonlinear approach (based on a machine learning 
algorithm) were used and compared to obtain the best model. 


2. METHOD 

A total of 79 mangoes of the Arumanis cultivar were obtained from farmers in Indramayu, Indonesia. 
The mangoes were harvested at 84 to 105 days after flowering and were then stored for between 3 and 9 
days under ambient temperature conditions. This variation in harvest age and storage time was used to 
obtain variation in the firmness of the mango fruit when measured. In total, 237 data points were obtained from 
these treatments. Each mango fruit was scanned to obtain spectral data, and a firmness test was 
carried out shortly afterwards. 

The near infrared spectra were acquired with a NIRFlex N-500 (fiber optic solid), integrated with the 
NIRCal 5.2 database. The spectra ranged from 1000 to 2500 nm with a scanning resolution of 0.4 nm. Each 
spectrum was averaged from 32 reflectance measurements. All measurements were taken at room 
temperature (25 °C). Each mango was scanned at the bottom, middle, and top sides. 

The reference firmness of each mango was measured directly after acquisition of the spectra. Each mango was 
tested for firmness at the same points (bottom, middle, and top sides) used for near-infrared scanning. 
Firmness was measured as the resistance of the fruit to the probe of a rheometer (CR-300) with a 
diameter of 5 mm. The firmness measurement was 
made with a maximum load of 10 kg, a compression depth of 10 mm, and a load decrease speed of 60 
mm/minute. The unit of measurement of firmness in this study is kg-force (kgf). 

Raw near infrared spectral data were treated using several techniques to improve reliability. The 
treatment of near-infrared spectral data in this study transforms the data to make them suitable 
for analysis and includes normalization, scaling, transformation, and removal of any outliers 
in the data [17]. The treatment techniques were mean normalization, multiplicative scatter correction, standard 
normal variate, Savitzky-Golay first derivative, and Savitzky-Golay second derivative. 
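
As an illustration of two of the treatments listed above, the following minimal sketch applies standard normal variate and Savitzky-Golay derivatives to a single made-up spectrum. The 5-point window and quadratic polynomial are assumptions chosen for brevity (the window size and polynomial order used in this study are not stated), and the function names are hypothetical; the study itself used Unscrambler software rather than this code.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]

# Savitzky-Golay convolution weights for a 5-point window and a quadratic fit
# (assumed settings, used here only for illustration).
SG_FIRST = [-2 / 10, -1 / 10, 0.0, 1 / 10, 2 / 10]
SG_SECOND = [2 / 7, -1 / 7, -2 / 7, -1 / 7, 2 / 7]

def sg_derivative(spectrum, weights, step=1.0, order=1):
    """Apply the weights to interior points; the two points at each edge are dropped."""
    half = len(weights) // 2
    out = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        out.append(sum(w * v for w, v in zip(weights, window)) / step ** order)
    return out

# Made-up log(1/R) values sampled every 0.4 nm (not data from this study).
toy = [0.10, 0.14, 0.20, 0.28, 0.38, 0.50, 0.64]
first_derivative = sg_derivative(snv(toy), SG_FIRST, step=0.4)
second_derivative = sg_derivative(toy, SG_SECOND, step=0.4, order=2)
print(first_derivative)
print(second_derivative)
```

Derivative treatments remove constant and sloping baseline offsets, which is one common reason they are tried alongside scatter corrections.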

Using the treated spectral data, calibration models based on near infrared spectra were established for 
firmness prediction. Three machine learning prediction algorithms were compared: principal 
component regression (PCR), partial least squares regression (PLSR), and support vector machine regression 
(SVMR). The 237 samples were divided into a calibration data set of 166 samples and a 
validation data set of 71 samples. 
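
The split sizes above can be reproduced with a short sketch. How the samples were assigned to each set is not described in this paper, so the seeded random shuffle below is purely an assumption, and the function name is hypothetical.

```python
import random

def split_calibration_validation(n_samples, n_calibration, seed=0):
    """Return two sorted lists of sample indices; assignment is random (an assumption)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return sorted(indices[:n_calibration]), sorted(indices[n_calibration:])

# 237 samples in total, 166 for calibration and the remaining 71 for validation.
calibration_idx, validation_idx = split_calibration_validation(237, 166)
print(len(calibration_idx), len(validation_idx))
```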

The performance of the model was analyzed based on the calibration and validation results 
according to the correlation coefficients of calibration (rc) and validation (rcv) and the root mean square errors of 
calibration (RMSE-C) and cross-validation (RMSE-CV) [18]-[21]. Furthermore, the prediction-to-deviation 
ratio (RPD) was evaluated [22]-[24]. The best firmness prediction model for whole 
mangoes was selected mainly on the basis of the RPD value. The chemometric analysis in this study was performed using 
Unscrambler software (X10.1). 
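
The performance measures named above can be sketched as follows. This paper does not write out the formulas, so the code uses the common definitions as an assumption: Pearson correlation for r, the square root of the mean squared residual for RMSE, and the standard deviation of the reference values divided by RMSE for RPD. The function names and the toy values are illustrative only.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(reference, predicted):
    """Pearson correlation between reference and predicted values."""
    mr, mp = mean(reference), mean(predicted)
    cov = sum((a - mr) * (b - mp) for a, b in zip(reference, predicted))
    var_r = sum((a - mr) ** 2 for a in reference)
    var_p = sum((b - mp) ** 2 for b in predicted)
    return cov / sqrt(var_r * var_p)

def rmse(reference, predicted):
    """Root mean square error, in the unit of the reference values (kgf here)."""
    return sqrt(mean([(a - b) ** 2 for a, b in zip(reference, predicted)]))

def rpd(reference, predicted):
    """Ratio of the SD of the reference values to the RMSE (assumed definition)."""
    return stdev(reference) / rmse(reference, predicted)

# Made-up firmness values in kgf, not data from this study.
reference = [0.5, 1.0, 2.0, 3.5]
predicted = [0.7, 0.9, 2.3, 3.2]
print(pearson_r(reference, predicted), rmse(reference, predicted), rpd(reference, predicted))
```

Under this definition, the validation SD of 1.21 kgf and the RMSE-CV of 0.472 kgf reported in this paper give a ratio of about 2.56, close to the reported RPD of 2.556.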


3. RESULTS AND DISCUSSION 
3.1. Descriptive statistics 

Box plots that illustrate the distribution of laboratory reference values for mango firmness in the near- 
infrared spectroscopy calibration and validation data sets are presented in Figure 1. The calibration data 
set contains 166 reference firmness values ranging from 0.24 to 4.12 kgf (mean= 1.5 kgf, SD= 1.13 kgf). 
The validation data set contains 71 reference firmness values ranging from 0.31 to 4.42 kgf (mean= 1.40 
kgf, SD= 1.21 kgf). The range of reference values was slightly narrower for the calibration set than for the 
validation set. 
Figure 1. Summary of firmness distributions within near infrared spectroscopy calibration and 
validation datasets 


3.2. Characteristics of near infrared spectral data 

The near-infrared raw spectra of Arumanis mango in diffuse reflectance (log 1/R) are depicted in 
Figure 2. These spectral features indicate the presence of organic materials because they are formed when 
molecular bonds (O-H, C-H-O, C-O) interact with incoming radiation. These bonds are susceptible to 
changes in vibrational energy, which produce two distinct vibration patterns, stretching 
and bending. As a result of combination bands and first overtones, the near infrared 
spectra of mango in this study exhibit features of O-H molecular bonds at wavelengths 
of 1418-1440 nm and 1920 nm. Furthermore, the absorption bands between 2120 and 2260 nm are associated 
with CHO compounds such as sugar and vitamin C. Organic acids are associated with additional absorption 
bands at around 1400, 1800, and 2100 nm. 
Figure 2. Near infrared raw spectra of mango Arumanis 


3.3. Results comparison of near infrared spectral calibration analysis 
3.3.1. Calibration model using PCR algorithms 

PCR was evaluated as the first linear method. PCR has been used for a long time 
in chemometric analysis and, although it does not always give the best results, in the authors' opinion it is worth 
trying. Based on the results of the PCR calibration, the second derivative spectral treatment provided 
the optimum results for the calibration and validation of firmness, as shown in Table 1. The 
PCR calibration model built on raw data gives an RPD of 1.675, which rises to 2.020 when the second 
derivative treatment is applied first. Some diffuse reflectance maxima were observed in the loadings, as 
shown in Figure 3, in the regions of 1801 and 1950 nm associated with water. Peaks in the NIR range were observed 
at 1950 nm, related to first overtone O-H and N-H stretching; at 1801 nm, corresponding to first overtone C-H 
stretching; and at 2304 nm, influenced by combination stretching and bending of the C-H band [25]. Because RPD 
relates the standard deviation of the reference values to the prediction error, it increases as the prediction 
error decreases, and a higher RPD indicates better predictive performance. As 
described in some studies, an RPD between 1.5 and 2 falls in the moderate model category, 
and an RPD between 2 and 2.5 falls in the good model category [26]-[29]. 
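
The RPD bands quoted above can be expressed as a small helper. This is a sketch only: the function name is hypothetical, the treatment of values exactly on a boundary is an assumption, and values below 1.5 are labeled simply as falling below the moderate band because the text does not name that category.

```python
def rpd_category(rpd):
    """Map an RPD value to the bands quoted in the text (boundary handling assumed)."""
    if rpd >= 2.0:
        return "good"
    if rpd >= 1.5:
        return "moderate"
    return "below moderate"

# RPD values for PCR on raw and second derivative spectra, as reported in Table 1.
for label, value in [("raw", 1.675), ("second derivative", 2.020)]:
    print(label, value, rpd_category(value))
```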


Table 1. Near infrared spectroscopy calibration spectral using PCR algorithms 

                                    Calibration          Validation
Spectral treatments                 rc     RMSE-C (kgf)  rcv    RMSE-CV (kgf)  RPD
Raw                                 0.775  0.713         0.802  0.720          1.675
Mean normalization                  0.663  0.845         0.750  0.797          1.513
Multiplicative scatter correction   0.758  0.736         0.800  0.723          1.666
Standard normal variate             0.759  0.735         0.798  0.726          1.660
First derivative                    0.829  0.631         0.858  0.619          1.947
Second derivative                   0.828  0.632         0.869  0.597          2.020

Figure 3. Loadings for the 2nd derivative treatments followed by PCR algorithm 


The scatter plot of the firmness from the standard laboratory measurement (reference) against the 
firmness predicted by the PCR method is shown in Figure 4. Using the best PCR method, the calibration 
model shown in Figure 4(a) was built with 166 samples and had an RMSE-C of 0.632 kgf. The correlation 
coefficient between the laboratory measurements and their predictions is 0.828. When this 
calibration model was validated with 71 samples, as shown in Figure 4(b), the RMSE 
decreased to 0.597 kgf. At the same time, the correlation coefficient between the laboratory 
measurements and their predictions increased to 0.869. Therefore, in this study, we suggest using the second 
derivative spectral treatment to build firmness prediction models of Arumanis mango with 
the PCR method. 


3.3.2. Calibration model using PLSR algorithms 

The second multivariate spectral analysis used in this study is PLSR. This method is widely used 
and is frequently reported to provide the best performance in constructing predictions from near infrared spectral 
data. This is because PLSR can cope with irrelevant and noisy variables. However, PLSR 
may not work well when the number of samples is much smaller than the 
number of variables, as is often the case with spectral data. In addition, the selection of spectral treatments is very important to improve the 
performance of the built model. The performance of the PLSR calibration model with several spectral 
treatments is presented in Table 2. It can be seen that the RPD generated from the near infrared spectra 
combined with PLSR reaches a satisfactory level (RPD > 2) [30]-[32]. Moreover, 
applying the first derivative treatment to the raw spectral data increases the RPD to greater than 2.5. 
An RPD greater than 2.5, according to the research results of Murphy, et al. [28], falls in 
the category of a good model for prediction. The loadings resulting from PLSR, as shown in Figure 5, were 
used to identify the most important wavelengths influencing the analyzed samples. A strong effect at 
1630, 1667, 1810, 2145, and 2316 nm could be observed. Peaks in the NIR range were observed at 1630 and 
1667 nm, related to first overtone C-H stretching, and at 2145 and 2316 nm, influenced by combination 
stretching and bending of the C-H and N-H bands [25]. 
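
To illustrate how PLSR copes with highly correlated spectral variables, the following minimal sketch fits a PLS model with a single latent variable to one response (often called PLS1). It is a toy, assuming a single component and made-up two-wavelength data; the models in this study were built in Unscrambler, and the number of latent variables used there is not stated. All names are hypothetical.

```python
from math import sqrt
from statistics import mean

def fit_pls1_one_component(X, y):
    """Fit a one-component PLS1 model; returns (x_mean, y_mean, weights, q)."""
    n, p = len(X), len(X[0])
    x_mean = [mean([row[j] for row in X]) for j in range(p)]
    y_mean = mean(y)
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred spectra
    yc = [v - y_mean for v in y]                                # centred response
    # Weight vector: direction in spectral space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores: projection of each centred spectrum onto the weight vector.
    t = [sum(xv * wv for xv, wv in zip(row, w)) for row in Xc]
    # Regress the centred response on the scores.
    q = sum(tv * yv for tv, yv in zip(t, yc)) / sum(tv * tv for tv in t)
    return x_mean, y_mean, w, q

def predict_pls1(model, x):
    """Predict the response for one new spectrum x."""
    x_mean, y_mean, w, q = model
    score = sum((xv - mv) * wv for xv, mv, wv in zip(x, x_mean, w))
    return y_mean + q * score

# Made-up data: two perfectly collinear "wavelengths" and a response proportional to them.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [3.0, 6.0, 9.0, 12.0]
model = fit_pls1_one_component(X, y)
print(predict_pls1(model, [5.0, 10.0]))
```

The two columns of X are exact multiples of each other, a situation in which ordinary least squares has no unique solution, yet the single latent variable still recovers the relationship.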
Figure 4. The best PCR regression results to predict mango firmness, (a) calibration model and 
(b) cross-validation model 


Table 2. Near infrared spectroscopy calibration spectral using PLSR algorithms 

                                    Calibration          Validation
Spectral treatments                 rc     RMSE-C (kgf)  rcv    RMSE-CV (kgf)  RPD
Raw                                 0.832  0.626         0.877  0.579          2.082
Mean normalization                  0.837  0.617         0.867  0.600          2.008
Multiplicative scatter correction   0.853  0.589         0.876  0.581          2.075
Standard normal variate             0.851  0.594         0.867  0.601          2.007
First derivative                    0.941  0.382         0.920  0.472          2.556
Second derivative                   0.881  0.533         0.885  0.561          2.150

Figure 5. Loadings for the 1st derivative treatments followed by PLSR algorithm 


The scatter plot of the firmness from the standard laboratory measurement (reference) against the 
firmness predicted by the PLSR method is shown in Figure 6. The calibration model using the best PLSR 
method was built with 166 samples and had an RMSE-C of 0.382 kgf, as shown in Figure 6(a). The correlation 
coefficient between the laboratory measurements and their predictions is 0.941. When this 
calibration model was validated with 71 samples, the RMSE increased to 0.472 kgf, as 
shown in Figure 6(b). At the same time, the correlation coefficient between the laboratory 
measurements and their predictions decreased to 0.920. Therefore, in this study, we suggest using the first 
derivative spectral treatment to build firmness prediction models of Arumanis mango with 
the PLSR method. 
Figure 6. The best regression results of PLSR to predict mango firmness, (a) calibration model and 
(b) cross-validation model 


3.3.3. Calibration model using SVMR algorithms 

The nonlinear multivariate spectral analysis method used in this study is SVMR. The SVMR 
method is currently widely used as a comparison against other multivariate spectral analysis methods that are 
based only on linear regression. It should be noted that several research reports state that the relationship 
between near infrared spectra and the targeted constituents to be modeled is not always linear. The origin of this 
nonlinearity may vary widely and is difficult to identify. Also, nonlinearity between near 
infrared spectral data and target constituents cannot be corrected by spectral treatments and requires 
special nonlinear methods. This means that standard linear regression techniques such as 
PCR and PLSR are not always the best solution. The performance of the SVMR calibration model with 
several spectral treatments is presented in Table 3. It can be seen that the RPD generated from the near 
infrared spectra combined with SVMR varies with the treatment (1 < RPD < 2). Applying 
the first derivative treatment to the raw spectral data increases the RPD to almost 2. The 
RPD achieved by SVMR in this study falls in the moderate performance category according to the 
study results of Murphy, et al. [28]. 
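
For readers unfamiliar with support vector regression, the sketch below shows two ingredients of its standard formulation: the epsilon-insensitive loss, which ignores residuals smaller than a tolerance, and a radial basis function kernel, which is one common way to obtain a nonlinear model. The kernel, the tolerance, and the other settings used in this study are not stated, so every parameter value and name here is an assumption for illustration.

```python
from math import exp

def epsilon_insensitive_loss(reference, predicted, epsilon):
    """Zero inside the tolerance band, linear in the residual outside it."""
    return max(0.0, abs(reference - predicted) - epsilon)

def rbf_kernel(x, z, gamma):
    """Similarity between two spectra; equals 1 when they are identical."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

# Illustrative values only: firmness in kgf with a tolerance of 0.1 kgf.
print(epsilon_insensitive_loss(2.0, 2.05, epsilon=0.1))   # inside the band
print(epsilon_insensitive_loss(2.0, 2.50, epsilon=0.1))   # outside the band
print(rbf_kernel([0.2, 0.4], [0.2, 0.4], gamma=0.5))
```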


Table 3. Near infrared spectroscopy calibration spectral using SVMR algorithms 

                                    Calibration          Validation
Spectral treatments                 rc     RMSE-C (kgf)  rcv    RMSE-CV (kgf)  RPD
Raw                                 0.663  0.884         0.627  0.393          1.284
Mean normalization                  0.718  0.894         0.742  0.550          1.490
Multiplicative scatter correction   0.794  0.745         0.752  0.566          1.517
Standard normal variate             0.794  0.744         0.752  0.565          1.517
First derivative                    0.920  0.468         0.863  0.745          1.980
Second derivative                   0.934  0.455         0.824  0.679          1.765



The scatter plot of the firmness from the standard laboratory measurement (reference) against the 
firmness predicted by the SVMR method is shown in Figure 7. The calibration model using the best SVMR 
method was built with 166 samples and had an RMSE-C of 0.468 kgf, as shown in Figure 7(a). The correlation 
coefficient between the laboratory measurements and their predictions is 0.920. When this 
calibration model was validated with 71 samples, the RMSE increased to 0.745 kgf, as 
shown in Figure 7(b). At the same time, the correlation coefficient between the laboratory 
measurements and their predictions decreased to 0.863. Therefore, in this study, we suggest using the first 
derivative spectral treatment to build firmness prediction models of Arumanis mango with 
the SVMR method. 
Figure 7. The best regression results of SVMR to predict mango firmness, (a) calibration model and 
(b) cross-validation model 


3.4. The best near infrared spectroscopy calibration model 

The performance of the two linear approaches (PCR and PLSR) and 
one nonlinear approach (SVMR) in predicting the firmness of Arumanis mango is compared in Figure 8. 
Based on RMSE-C, RMSE-CV, and RPD, both linear and nonlinear multivariate spectral 
analysis can predict the firmness of Arumanis mango with moderate to good performance. In the current study, 
the machine learning algorithms can be ordered by error as 
PLSR < SVMR < PCR. The linear method (PLSR) therefore performed 
better than the nonlinear method (SVMR). In contrast, Cardoso and Poppi [33] 
reported that SVMR performed better than PLSR in handling near infrared data from commercial green tea 
blends. For comparison, Wang, et al. [19] found that near infrared spectroscopy combined 
with several multivariate spectral analyses predicted the firmness of pears with RPD in the range of 1.23 
to 1.60. 
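
The comparison can be retraced with a short sketch that ranks the best model of each algorithm using the values reported in Tables 1 to 3 (second derivative for PCR, first derivative for PLSR and SVMR). The dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Best-treatment results per algorithm, copied from Tables 1 to 3 (RMSE in kgf).
best_models = {
    "PCR":  {"rmse_c": 0.632, "rmse_cv": 0.597, "rpd": 2.020},
    "PLSR": {"rmse_c": 0.382, "rmse_cv": 0.472, "rpd": 2.556},
    "SVMR": {"rmse_c": 0.468, "rmse_cv": 0.745, "rpd": 1.980},
}

def rank_by(metric, descending=False):
    """Order the algorithm names by one metric (ascending unless stated otherwise)."""
    return sorted(best_models, key=lambda name: best_models[name][metric], reverse=descending)

print(rank_by("rmse_c"))                 # lowest calibration error first
print(rank_by("rmse_cv"))                # lowest validation error first
print(rank_by("rpd", descending=True))   # highest RPD first
```

With these tabulated values, ranking by RMSE-C gives PLSR, SVMR, PCR, whereas ranking by RMSE-CV or by RPD places PCR ahead of SVMR; PLSR comes first on all three measures.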
Figure 8. Comparison of the performance of three machine learning algorithms for the prediction 
of mango firmness 


4. CONCLUSION 

Near infrared spectroscopy calibration models were established to predict the firmness of 
mango using several machine learning algorithms. The findings of this study show that near-infrared 
spectroscopy can predict the firmness of the mango cultivar Arumanis with a moderate to good 
degree of precision using PCR, PLSR, and SVMR (rc= 0.828-0.941, rcv= 0.863-0.920, RMSE-C= 0.382- 
0.632 kgf, RMSE-CV= 0.472-0.597 kgf, RPD= 1.980-2.556). The calibration model developed in this 
investigation will allow a more timely determination of the firmness of mangoes, which is 
one of the parameters determining their quality. The developed calibration model is 
expected to be used by researchers and farmers to make the right decisions in the post-harvest handling of 
mangoes, especially the Arumanis cultivar. 